ROL_MajExp__taxprofiler__042225_1
A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
This report has been generated by the nf-core/taxprofiler analysis pipeline. For information about how to interpret these results, please see the documentation.
Report generated on 2025-04-22, 21:26 PDT, based on data in: /nfs3/Sharpton_Lab/prod/prod_restructure/projects/sielerjm/ZF__MajorExp/Databases/Deworming/Nextflow/work/ca/f69cb119076f80a0ee129468787b65
General Statistics
By default, all read count columns are displayed as millions (M) of reads.
| Sample Name | % Aligned | Error rate | Non-primary | Reads mapped | % Mapped | % Proper pairs | % MapQ 0 reads | Total seqs | Danio rerio | Top 5 species | Unclassified |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TS047_RoL_RNA_16_TS047_RoL_RNA_16 | 75.3% | 2.21% | 0.0M | 35.7M | 75.3% | 56.7% | 6.4% | 47.4M | | | |
| TS047_RoL_RNA_16_TS047_RoL_RNA_16_kraken2_custom_k2 | | | | | | | | | 96.2% | 96.2% | 3.8% |
| TS047_RoL_RNA_16_TS047_RoL_RNA_16_kraken2_custom_k2.bracken | | | | | | | | | 96.2% | 96.2% | 3.8% |
| TS047_RoL_RNA_177_TS047_RoL_RNA_177 | 72.3% | 2.27% | 0.0M | 34.9M | 72.3% | 51.9% | 6.5% | 48.2M | | | |
| TS047_RoL_RNA_177_TS047_RoL_RNA_177_kraken2_custom_k2 | | | | | | | | | 96.3% | 96.3% | 3.7% |
| TS047_RoL_RNA_177_TS047_RoL_RNA_177_kraken2_custom_k2.bracken | | | | | | | | | 96.3% | 96.3% | 3.7% |
| TS047_RoL_RNA_20_TS047_RoL_RNA_20 | 73.6% | 2.27% | 0.0M | 41.9M | 73.6% | 54.1% | 6.6% | 56.9M | | | |
| TS047_RoL_RNA_20_TS047_RoL_RNA_20_kraken2_custom_k2 | | | | | | | | | 96.5% | 96.5% | 3.5% |
| TS047_RoL_RNA_20_TS047_RoL_RNA_20_kraken2_custom_k2.bracken | | | | | | | | | 96.5% | 96.5% | 3.5% |
| TS047_RoL_RNA_291_TS047_RoL_RNA_291 | 72.6% | 2.26% | 0.0M | 36.4M | 72.6% | 52.8% | 6.3% | 50.2M | | | |
| TS047_RoL_RNA_291_TS047_RoL_RNA_291_kraken2_custom_k2 | | | | | | | | | 95.2% | 95.2% | 4.8% |
| TS047_RoL_RNA_291_TS047_RoL_RNA_291_kraken2_custom_k2.bracken | | | | | | | | | 95.2% | 95.2% | 4.8% |
| TS047_RoL_RNA_328_TS047_RoL_RNA_328 | 72.0% | 2.27% | 0.0M | 43.0M | 72.0% | 52.5% | 6.6% | 59.7M | | | |
| TS047_RoL_RNA_328_TS047_RoL_RNA_328_kraken2_custom_k2 | | | | | | | | | 94.2% | 94.2% | 5.8% |
| TS047_RoL_RNA_328_TS047_RoL_RNA_328_kraken2_custom_k2.bracken | | | | | | | | | 94.2% | 94.2% | 5.8% |
| TS047_RoL_RNA_355_TS047_RoL_RNA_355 | 73.3% | 2.14% | 0.0M | 29.1M | 73.3% | 56.9% | 6.3% | 39.7M | | | |
| TS047_RoL_RNA_355_TS047_RoL_RNA_355_kraken2_custom_k2 | | | | | | | | | 82.7% | 82.7% | 17.3% |
| TS047_RoL_RNA_355_TS047_RoL_RNA_355_kraken2_custom_k2.bracken | | | | | | | | | 82.7% | 82.7% | 17.3% |
| TS047_RoL_RNA_46_TS047_RoL_RNA_46 | 64.6% | 2.72% | 0.0M | 33.2M | 64.6% | 40.4% | 12.4% | 51.5M | | | |
| TS047_RoL_RNA_46_TS047_RoL_RNA_46_kraken2_custom_k2 | | | | | | | | | 98.9% | 98.9% | 1.1% |
| TS047_RoL_RNA_46_TS047_RoL_RNA_46_kraken2_custom_k2.bracken | | | | | | | | | 98.9% | 98.9% | 1.1% |
| TS047_RoL_RNA_477_TS047_RoL_RNA_477 | 72.8% | 2.24% | 0.0M | 41.4M | 72.8% | 53.1% | 6.4% | 56.8M | | | |
| TS047_RoL_RNA_477_TS047_RoL_RNA_477_kraken2_custom_k2 | | | | | | | | | 95.8% | 95.8% | 4.2% |
| TS047_RoL_RNA_477_TS047_RoL_RNA_477_kraken2_custom_k2.bracken | | | | | | | | | 95.8% | 95.8% | 4.2% |
| TS047_RoL_RNA_498_TS047_RoL_RNA_498 | 72.6% | 2.20% | 0.0M | 40.7M | 72.6% | 53.4% | 6.4% | 56.0M | | | |
| TS047_RoL_RNA_498_TS047_RoL_RNA_498_kraken2_custom_k2 | | | | | | | | | 94.0% | 94.0% | 6.0% |
| TS047_RoL_RNA_498_TS047_RoL_RNA_498_kraken2_custom_k2.bracken | | | | | | | | | 94.0% | 94.0% | 6.0% |
| TS047_RoL_RNA_526_TS047_RoL_RNA_526 | 68.7% | 2.47% | 0.0M | 51.1M | 68.7% | 46.5% | 9.4% | 74.4M | | | |
| TS047_RoL_RNA_526_TS047_RoL_RNA_526_kraken2_custom_k2 | | | | | | | | | 97.7% | 97.7% | 2.3% |
| TS047_RoL_RNA_526_TS047_RoL_RNA_526_kraken2_custom_k2.bracken | | | | | | | | | 97.7% | 97.7% | 2.3% |
| TS047_RoL_RNA_532_TS047_RoL_RNA_532 | 72.2% | 2.26% | 0.0M | 38.0M | 72.2% | 51.8% | 6.4% | 52.6M | | | |
| TS047_RoL_RNA_532_TS047_RoL_RNA_532_kraken2_custom_k2 | | | | | | | | | 96.4% | 96.4% | 3.6% |
| TS047_RoL_RNA_532_TS047_RoL_RNA_532_kraken2_custom_k2.bracken | | | | | | | | | 96.4% | 96.4% | 3.6% |
| TS047_RoL_RNA_655_TS047_RoL_RNA_655 | 68.6% | 2.42% | 0.0M | 38.4M | 68.6% | 46.1% | 8.5% | 56.0M | | | |
| TS047_RoL_RNA_655_TS047_RoL_RNA_655_kraken2_custom_k2 | | | | | | | | | 97.3% | 97.3% | 2.7% |
| TS047_RoL_RNA_655_TS047_RoL_RNA_655_kraken2_custom_k2.bracken | | | | | | | | | 97.3% | 97.3% | 2.7% |
| TS047_RoL_RNA_708_TS047_RoL_RNA_708 | 71.2% | 2.24% | 0.0M | 48.0M | 71.2% | 51.1% | 6.3% | 67.5M | | | |
| TS047_RoL_RNA_708_TS047_RoL_RNA_708_kraken2_custom_k2 | | | | | | | | | 95.8% | 95.8% | 4.2% |
| TS047_RoL_RNA_708_TS047_RoL_RNA_708_kraken2_custom_k2.bracken | | | | | | | | | 95.8% | 95.8% | 4.2% |
| TS047_RoL_RNA_710_TS047_RoL_RNA_710 | 73.1% | 2.17% | 0.0M | 38.7M | 73.1% | 54.1% | 5.9% | 53.0M | | | |
| TS047_RoL_RNA_710_TS047_RoL_RNA_710_kraken2_custom_k2 | | | | | | | | | 95.3% | 95.3% | 4.7% |
| TS047_RoL_RNA_710_TS047_RoL_RNA_710_kraken2_custom_k2.bracken | | | | | | | | | 95.3% | 95.3% | 4.7% |
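As a consistency check on the table above, % Mapped should be roughly Reads mapped divided by Total seqs. The following is a minimal Python sketch using three rows copied from the table, with sample names shortened. The helper names `pct_mapped` and `check_rows` are hypothetical, and because the table values are already rounded to one decimal, the recomputed percentage can differ from the reported one by about 0.1 points.

```python
# Rows copied from the General Statistics table (sample names shortened):
# (sample, reads mapped in millions, total seqs in millions, reported % mapped)
ROWS = [
    ("TS047_RoL_RNA_16", 35.7, 47.4, 75.3),
    ("TS047_RoL_RNA_46", 33.2, 51.5, 64.6),
    ("TS047_RoL_RNA_526", 51.1, 74.4, 68.7),
]


def pct_mapped(reads_mapped_m: float, total_seqs_m: float) -> float:
    """Percentage of sequences that mapped, from counts given in millions."""
    return 100.0 * reads_mapped_m / total_seqs_m


def check_rows(rows, tolerance: float = 0.2):
    """Recompute % mapped per row and flag rows outside the tolerance.

    The tolerance is an assumption chosen to absorb rounding in the table.
    """
    results = []
    for sample, mapped_m, total_m, reported in rows:
        recomputed = pct_mapped(mapped_m, total_m)
        results.append((sample, recomputed, reported, abs(recomputed - reported) <= tolerance))
    return results


if __name__ == "__main__":
    for sample, recomputed, reported, ok in check_rows(ROWS):
        status = "ok" if ok else "MISMATCH"
        print(f"{sample}: recomputed {recomputed:.1f}% vs reported {reported:.1f}% [{status}]")
```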
bowtie2
Results from both Bowtie 2 and HISAT2, tools for aligning reads against a reference genome.
URL: http://bowtie-bio.sourceforge.net/bowtie2; https://ccb.jhu.edu/software/hisat2
DOI: 10.1038/nmeth.1923; 10.1038/nmeth.3317; 10.1038/s41587-019-0201-4
Paired-end alignments
This plot shows the number of reads aligning to the reference in different ways.
There are 6 possible types of alignment:
- PE mapped uniquely: Pair has only one occurrence in the reference genome.
- PE mapped discordantly uniquely: Pair has only one occurrence, but not as a proper pair.
- PE one mate mapped uniquely: One read of a pair has one occurrence.
- PE multimapped: Pair has multiple occurrences.
- PE one mate multimapped: One read of a pair has multiple occurrences.
- PE neither mate aligned: Pair has no occurrence.
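The six categories above can be combined into a single overall alignment rate. The sketch below assumes that the three pair-level categories are counted in pairs and the three mate-level categories in individual reads, which is how Bowtie 2's own alignment summary reports them. The class name `PairedEndCounts` and the example counts are illustrative and are not taken from this report.

```python
from dataclasses import dataclass


@dataclass
class PairedEndCounts:
    """Counts for the six paired-end alignment categories.

    Assumption: pair-level fields count pairs, mate-level fields count reads.
    """

    pe_unique: int             # pairs aligned as a proper pair, exactly once
    pe_discordant_unique: int  # pairs aligned once, but not as a proper pair
    pe_multimapped: int        # pairs aligned as a proper pair, more than once
    mate_unique: int           # single mates aligned exactly once (reads)
    mate_multimapped: int      # single mates aligned more than once (reads)
    mate_unaligned: int        # mates with no alignment (reads)

    def total_reads(self) -> int:
        """Total number of reads: two per pair plus all individually counted mates."""
        pairs = self.pe_unique + self.pe_discordant_unique + self.pe_multimapped
        return 2 * pairs + self.mate_unique + self.mate_multimapped + self.mate_unaligned

    def overall_alignment_rate(self) -> float:
        """Percentage of reads with at least one alignment."""
        total = self.total_reads()
        if total == 0:
            raise ValueError("no reads counted")
        return 100.0 * (total - self.mate_unaligned) / total


if __name__ == "__main__":
    # Illustrative counts for 10,000 read pairs (20,000 reads).
    counts = PairedEndCounts(
        pe_unique=8823,
        pe_discordant_unique=34,
        pe_multimapped=527,
        mate_unique=571,
        mate_multimapped=1,
        mate_unaligned=660,
    )
    print(f"{counts.total_reads()} reads, {counts.overall_alignment_rate():.2f}% aligned")
```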
Samtools Stats
Toolkit for interacting with BAM/CRAM files.
URL: http://www.htslib.org
DOI: 10.1093/bioinformatics/btp352
Percent mapped
Alignment metrics from samtools stats; mapped vs. unmapped reads vs. reads mapped with MQ0.
For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below the levels recommended for your application.
Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).
Reads mapped with MQ0 often indicate that the reads are ambiguously mapped to multiple locations in the reference sequence. This can be due to repetitive regions in the genome, the presence of alternative contigs in the reference, or due to reads that are too short to be uniquely mapped. These reads are often filtered out in downstream analyses.
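To make the MQ0 idea concrete, the sketch below counts mapped records and MAPQ 0 records in plain-text SAM lines. It relies only on the SAM column layout (FLAG is column 2, MAPQ is column 5, and FLAG bit 0x4 marks an unmapped read). It is a simplification: it counts every mapped record, does not treat secondary or supplementary alignments separately, and is not how samtools itself is implemented. The toy records are invented for illustration.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit set when the read is unmapped


def count_mapq0(sam_lines):
    """Return (mapped_records, mapq0_records) for an iterable of SAM text lines."""
    mapped = 0
    mapq0 = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        mapq = int(fields[4])
        if flag & UNMAPPED_FLAG:
            continue  # unmapped reads carry no meaningful MAPQ
        mapped += 1
        if mapq == 0:
            mapq0 += 1
    return mapped, mapq0


# Toy SAM records (hypothetical): one confident hit, one MAPQ 0 hit, one unmapped read.
TOY_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t0\tchr1\t200\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
]

if __name__ == "__main__":
    mapped, mapq0 = count_mapq0(TOY_SAM)
    print(f"mapped={mapped} mapq0={mapq0} ({100.0 * mapq0 / mapped:.1f}% of mapped)")
```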
Alignment stats
This module parses the output from samtools stats. All numbers in millions.
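For readers who want to pull the same numbers out of a `samtools stats` file directly, the summary values sit on tab-separated lines that begin with `SN`. The sketch below parses those lines into a dictionary. The function name and the example values are hypothetical; only the `SN` line layout and the key names shown are taken from samtools stats output.

```python
def parse_summary_numbers(text: str) -> dict:
    """Parse the 'SN' (summary numbers) lines of samtools stats output.

    Each line looks like: SN<TAB>key:<TAB>value[<TAB># comment]
    """
    summary = {}
    for line in text.splitlines():
        if not line.startswith("SN\t"):
            continue
        parts = line.split("\t")
        key = parts[1].rstrip(":")
        raw = parts[2]
        try:
            summary[key] = int(raw)
        except ValueError:
            summary[key] = float(raw)  # e.g. error rate in scientific notation
    return summary


# Invented example values, laid out like samtools stats output.
EXAMPLE = (
    "# This file was produced by samtools stats\n"
    "SN\traw total sequences:\t1000\n"
    "SN\treads mapped:\t750\n"
    "SN\treads MQ0:\t60\n"
    "SN\terror rate:\t2.21e-02\t# mismatches / bases mapped (cigar)\n"
)

if __name__ == "__main__":
    sn = parse_summary_numbers(EXAMPLE)
    pct = 100.0 * sn["reads mapped"] / sn["raw total sequences"]
    print(f"reads mapped: {sn['reads mapped'] / 1e6:.4f}M ({pct:.1f}%)")
```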
Kraken
Taxonomic classification using exact k-mer matches to find the lowest common ancestor (LCA) of a given sequence.
URL: https://ccb.jhu.edu/software/kraken
DOI: 10.1186/gb-2014-15-3-r46
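As an illustration of what "lowest common ancestor" means, the sketch below finds the LCA of taxa in a toy parent-pointer taxonomy. It is not Kraken's implementation: the tree is a hand-made stand-in with most ranks omitted, and the function names are hypothetical.

```python
from functools import reduce

# Toy taxonomy (child -> parent); intermediate ranks are omitted for brevity.
PARENT = {
    "root": None,
    "Eukaryota": "root",
    "Bacteria": "root",
    "Danio": "Eukaryota",
    "Danio rerio": "Danio",
    "Danio albolineatus": "Danio",
    "Escherichia coli": "Bacteria",
}


def ancestors(taxon: str) -> list:
    """Return the path from a taxon up to the root, starting with the taxon itself."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a: str, b: str) -> str:
    """Lowest common ancestor of two taxa in the toy tree."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node  # first shared node when walking up from b
    raise ValueError("taxa do not share a root")


def lca_many(taxa) -> str:
    """LCA of several taxa, folding pairwise."""
    return reduce(lca, taxa)


if __name__ == "__main__":
    print(lca("Danio rerio", "Danio albolineatus"))
    print(lca_many(["Danio rerio", "Danio albolineatus", "Escherichia coli"]))
```

A k-mer found in both toy *Danio* species would be assigned to the genus, while one also found in the bacterium would only be assignable to the root.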
Top taxa
The number of reads falling into the top 5 taxa across different ranks.
To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples. The counts for the resulting top 5 taxa are then plotted for each of the 9 different taxonomic ranks. The unclassified count is always shown across all taxonomic ranks.
The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for.
Note that this is only an approximation, and that Kraken percentages don't always add up to exactly 100%.
The category "Other" shows the difference between the above total read count and the sum of the read counts in the top 5 taxa shown + unclassified. This should cover all taxa not in the top 5, +/- any rounding errors.
Note that any taxon that does not exactly fit a taxon rank (e.g. - or G2) is ignored.
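The arithmetic described above can be sketched in a few lines of Python. This is a reading of the description, not MultiQC's actual code: the function names are hypothetical and the example numbers are invented.

```python
from collections import Counter


def top_taxa(pct_by_sample: dict, n: int = 5) -> list:
    """Sum each taxon's percentage across samples and return the n largest."""
    totals = Counter()
    for pcts in pct_by_sample.values():
        totals.update(pcts)  # adds the per-sample percentages taxon by taxon
    return [taxon for taxon, _ in totals.most_common(n)]


def approx_total_reads(unclassified_reads: float, unclassified_pct: float) -> float:
    """Approximate library size from the unclassified count and its percentage."""
    if unclassified_pct <= 0:
        raise ValueError("unclassified percentage must be positive")
    return unclassified_reads * 100.0 / unclassified_pct


def other_reads(total: float, top_counts, unclassified_reads: float) -> float:
    """Reads in neither the top taxa nor the unclassified bin."""
    return total - sum(top_counts) - unclassified_reads


if __name__ == "__main__":
    # Invented example: 50,000 unclassified reads making up 5.0% of the library.
    total = approx_total_reads(50_000, 5.0)
    other = other_reads(total, [900_000, 20_000, 10_000, 5_000, 5_000], 50_000)
    print(f"approximate total: {total:.0f}, other: {other:.0f}")
```

If the unclassified percentage is very small, rounding in that percentage dominates the estimate, which is one reason the total is only approximate.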
Bracken
Estimates species abundances in metagenomics samples by probabilistically re-distributing reads in the taxonomic tree.
URL: https://ccb.jhu.edu/software/kraken
DOI: 10.7717/peerj-cs.104
ℹ️: The plot title will say Kraken2 because the first step of Bracken produces the same output format as Kraken. Abundance information is currently not supported in MultiQC.
Top taxa
The number of reads falling into the top 5 taxa across different ranks.
To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples. The counts for the resulting top 5 taxa are then plotted for each of the 9 different taxonomic ranks. The unclassified count is always shown across all taxonomic ranks.
The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for.
Note that this is only an approximation, and that Kraken percentages don't always add up to exactly 100%.
The category "Other" shows the difference between the above total read count and the sum of the read counts in the top 5 taxa shown + unclassified. This should cover all taxa not in the top 5, +/- any rounding errors.
Note that any taxon that does not exactly fit a taxon rank (e.g. - or G2) is ignored.
Software Versions
Software Versions lists versions of software tools extracted from file contents.
| Group | Software | Version |
|---|---|---|
| BOWTIE2_ALIGN | bowtie2 | 2.5.2 |
| | pigz | 2.6 |
| | samtools | 1.18 |
| BOWTIE2_BUILD | bowtie2 | 2.5.2 |
| KRAKEN2_KRAKEN2 | kraken2 | 2.1.3 |
| | pigz | 2.8 |
| KRAKENTOOLS_COMBINEKREPORTS_KRAKEN | combine_kreports.py | 1.2 |
| KRAKENTOOLS_KREPORT2KRONA | kreport2krona.py | 1.2 |
| KRONA_CLEANUP | sed | 4.7 |
| KRONA_KTIMPORTTEXT | krona | 2.8.1 |
| MINIMAP2_INDEX | minimap2 | 2.28-r1209 |
| SAMTOOLS_INDEX | samtools | 1.2 |
| Samtools Stats | samtools | 1.2 |
| TAXPASTA_MERGE | taxpasta | 0.7.0 |
| Workflow | Nextflow | 24.10.4 |
| | nf-core/taxprofiler | v1.2.2 |
nf-core/taxprofiler Methods Description
Suggested text and references to use when describing pipeline usage within the methods section of a publication.
URL: https://github.com/nf-core/taxprofiler
Methods
Data was processed using nf-core/taxprofiler v1.2.2 (doi: 10.1101/2023.10.20.563221) of the nf-core collection of workflows (Ewels et al., 2020), utilising reproducible software environments from the Bioconda (Grüning et al., 2018) and Biocontainers (da Veiga Leprevost et al., 2017) projects.
The pipeline was executed with Nextflow v24.10.4 (Di Tommaso et al., 2017) with the following command:
nextflow run /local/cqls/software/nextflow/assets/nf-core-taxprofiler_1.2.2/1_2_2 -profile singularity -config nextflow.config -resume -params-file nf-params.json
Tools used in the workflow included: Sequencing quality control with FastQC (Andrews 2010). Host read removal was performed for short reads with Bowtie2 (Langmead and Salzberg 2012) and SAMtools (Danecek et al. 2021). Host read removal was performed for long reads with minimap2 (Li 2018) and SAMtools (Danecek et al. 2021). Taxonomic classification or profiling was carried out with: Bracken (Lu et al. 2017), Kraken2 (Wood et al. 2019). Visualisation of results, where supported, was performed with Krona (Ondov et al. 2011). Standardisation of taxonomic profiles was carried out with TAXPASTA (Beber et al. 2023). Pipeline results statistics were summarised with MultiQC (Ewels et al. 2016).
References
- Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: 10.1038/nbt.3820
- Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: 10.1038/s41587-020-0439-x
- Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: 10.1038/s41592-018-0046-7
- da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: 10.1093/bioinformatics/btx192
- Stamouli, S., Beber, M. E., Normark, T., Christensen, T. A., Andersson-Li, L., Borry, M., Jamy, M., nf-core community, & Fellows Yates, J. A. (2023). nf-core/taxprofiler: Highly parallelised and flexible pipeline for metagenomic taxonomic classification and profiling. (Preprint). bioRxiv 2023.10.20.563221. doi: 10.1101/2023.10.20.563221
- Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data. URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359. doi: 10.1038/nmeth.1923
- Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094–3100. doi: 10.1093/bioinformatics/bty191
- Danecek, P., Bonfield, J. K., Liddle, J., Marshall, J., Ohan, V., Pollard, M. O., Whitwham, A., Keane, T., McCarthy, S. A., Davies, R. M., & Li, H. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2). doi: 10.1093/gigascience/giab008
- Lu, J., Breitwieser, F. P., Thielen, P., & Salzberg, S. L. (2017). Bracken: estimating species abundance in metagenomics data. PeerJ. Computer Science, 3(e104), e104. doi: 10.7717/peerj-cs.104
- Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. doi: 10.1186/s13059-019-1891-0
- Ondov, B. D., Bergman, N. H., & Phillippy, A. M. (2011). Interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 12(1), 385. doi: 10.1186/1471-2105-12-385
- Beber, M. E., Borry, M., Stamouli, S., & Fellows Yates, J. A. (2023). TAXPASTA: TAXonomic Profile Aggregation and STAndardisation. Journal of Open Source Software, 8(87), 5627. doi: 10.21105/joss.05627
- Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. doi: 10.1093/bioinformatics/btw354
Notes:
- (doi: 10.1101/2023.10.20.563221)
- The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!
- You should also cite all software used within this run. Check the "Software Versions" of this report to get version information.
nf-core/taxprofiler Workflow Summary
This information is collected when the pipeline is started.
URL: https://github.com/nf-core/taxprofiler
Input/output options
- databases: /nfs3/Sharpton_Lab/prod/prod_restructure/projects/sielerjm/ZF__MajorExp/Databases/Deworming/Nextflow/database_sheet.csv
- sielerjm@oregonstate.edu
- input: /nfs3/Sharpton_Lab/prod/prod_restructure/projects/sielerjm/ZF__MajorExp/Databases/Deworming/Nextflow/filtered_samplesheet.csv
- multiqc_title: ROL_MajExp__taxprofiler__042225_1
- outdir: /nfs3/Sharpton_Lab/prod/prod_restructure/projects/sielerjm/ZF__MajorExp/Transcriptomics/Results/taxprofiler
Preprocessing general QC options
- skip_preprocessing_qc: true
Preprocessing host removal options
- hostremoval_reference: /nfs3/Sharpton_Lab/prod/prod_restructure/projects/sielerjm/ZF__MajorExp/Databases/NCBI/Zebrafish/ZF_genome.fa
- perform_longread_hostremoval: true
- perform_shortread_hostremoval: true
Profiling options
- bracken_save_intermediatekraken2: true
- run_bracken: true
- run_kraken2: true
Postprocessing and visualisation options
- run_krona: true
- run_profile_standardisation: true
- taxpasta_add_name: true
- taxpasta_add_rank: true
- taxpasta_ignore_errors: true
- taxpasta_taxonomy_dir: /nfs3/Sharpton_Lab/prod/prod_restructure/projects/sielerjm/ZF__MajorExp/Databases/Deworming/kraken2_custom_k2/taxonomy
Generic options
- trace_report_suffix: 2025-04-22_23-28-42
Core Nextflow options
- configFiles: N/A
- containerEngine: singularity
- launchDir: /nfs3/Sharpton_Lab/prod/prod_restructure/projects/sielerjm/ZF__MajorExp/Databases/Deworming/Nextflow
- profile: singularity
- projectDir: /local/cqls/software/nextflow/assets/nf-core-taxprofiler_1.2.2/1_2_2
- runName: modest_becquerel
- userName: sielerjm
- workDir: /nfs3/Sharpton_Lab/prod/prod_restructure/projects/sielerjm/ZF__MajorExp/Databases/Deworming/Nextflow/work
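The pipeline parameters listed in this summary could be collected into a JSON params file of the kind passed to Nextflow with `-params-file`. The sketch below rebuilds such a file from the values shown above. It is a reconstruction under assumptions, not the actual `nf-params.json` used for this run, which is not included in the report. The e-mail address is left out because its parameter name does not appear in the summary, and the `BASE` and `render_params` names are hypothetical.

```python
import json

# Common path prefix shared by the values in the workflow summary.
BASE = "/nfs3/Sharpton_Lab/prod/prod_restructure/projects/sielerjm/ZF__MajorExp"

# Pipeline parameters copied from the summary sections above
# (Input/output, preprocessing, host removal, profiling, postprocessing).
PARAMS = {
    "databases": f"{BASE}/Databases/Deworming/Nextflow/database_sheet.csv",
    "input": f"{BASE}/Databases/Deworming/Nextflow/filtered_samplesheet.csv",
    "multiqc_title": "ROL_MajExp__taxprofiler__042225_1",
    "outdir": f"{BASE}/Transcriptomics/Results/taxprofiler",
    "skip_preprocessing_qc": True,
    "hostremoval_reference": f"{BASE}/Databases/NCBI/Zebrafish/ZF_genome.fa",
    "perform_longread_hostremoval": True,
    "perform_shortread_hostremoval": True,
    "bracken_save_intermediatekraken2": True,
    "run_bracken": True,
    "run_kraken2": True,
    "run_krona": True,
    "run_profile_standardisation": True,
    "taxpasta_add_name": True,
    "taxpasta_add_rank": True,
    "taxpasta_ignore_errors": True,
    "taxpasta_taxonomy_dir": f"{BASE}/Databases/Deworming/kraken2_custom_k2/taxonomy",
}


def render_params(params: dict) -> str:
    """Serialise the parameters as indented JSON text."""
    return json.dumps(params, indent=2)


if __name__ == "__main__":
    # Print rather than write to disk, so an existing params file is never overwritten.
    print(render_params(PARAMS))
```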